Training selection for tuning entity matching
Abstract
Entity matching is a crucial and difficult task for data integration. An effective solution strategy typically must combine several techniques and find suitable settings for critical configuration parameters such as similarity thresholds. Supervised (training-based) approaches promise to reduce the manual work of determining (learning) effective strategies for entity matching. However, they depend critically on training data selection, a difficult problem that has so far mostly been addressed manually by human experts. In this paper, we propose a training-based framework called STEM for entity matching and present different generic methods for automatically selecting training data to combine and configure several matching techniques. We evaluate the proposed methods for different match tasks and for small- and medium-sized training sets.
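To make the idea of training selection and threshold tuning concrete, here is a minimal sketch. It is not the STEM framework or any of the selection methods from the paper, whose details the abstract does not give. It assumes a single string matcher (Python's `difflib.SequenceMatcher`), a simple "keep pairs above a minimum similarity, then sample at random" selection rule, and a threshold chosen to maximize F1 on the labeled sample. The names `similarity`, `select_training_pairs`, `learn_threshold`, `min_sim`, and `n` are hypothetical and used only for illustration.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for any matcher (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_training_pairs(pairs, min_sim=0.4, n=4, seed=0):
    """Keep pairs whose similarity reaches min_sim, then sample up to n of them.

    The sampled pairs are the ones a human would label as match / non-match.
    This selection rule is illustrative, not taken from the paper.
    """
    candidates = [p for p in pairs if similarity(*p) >= min_sim]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


def learn_threshold(labeled):
    """Pick the similarity threshold that maximizes F1 on the labeled pairs.

    labeled: list of ((a, b), is_match) tuples.
    Returns (threshold, f1).
    """
    scored = [(similarity(a, b), is_match) for (a, b), is_match in labeled]
    best_t, best_f1 = 1.0, -1.0
    # Only observed similarity values need to be tried as thresholds.
    for t in sorted({s for s, _ in scored}):
        tp = sum(1 for s, m in scored if s >= t and m)
        fp = sum(1 for s, m in scored if s >= t and not m)
        fn = sum(1 for s, m in scored if s < t and m)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

In this sketch, the quality of the learned threshold depends entirely on which pairs were selected for labeling: a sample containing only obvious non-matches gives the learner nothing to separate, which is the dependence on training selection that the abstract describes.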
Similar resources
Efficient Selection of Mappings and Automatic Quality-driven Combination of Matching Methods
The AgreementMaker system for ontology matching includes an extensible architecture that facilitates the integration and performance tuning of a variety of matching methods, an evaluation mechanism, which can make use of a reference matching or rely solely on “inherent” quality measures, and a multi-purpose user interface, which drives both the matching methods and the evaluation strategies. In...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
A Differential Evolution and Spatial Distribution based Local Search for Training Fuzzy Wavelet Neural Network
Many parameter-tuning algorithms have been proposed for training Fuzzy Wavelet Neural Networks (FWNNs). Absence of an appropriate structure, convergence to local optima, and slow learning algorithms are deficiencies of FWNNs in previous studies. In this paper, a Memetic Algorithm (MA) is introduced to train FWNNs and address these learning deficiencies. Differential Evolution...
Entity Matching for Intelligent Information Integration
Due to the rapid development of information technologies, especially the network technologies, business activities have never been as integrated as they are now. Business decision making often requires gathering information from different sources. This dissertation focuses on the problem of entity matching, associating corresponding information elements within or across information systems. It ...
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts
We present a novel data selection method for lightly supervised training of acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct...
Publication date: 2008